The Kolmogorov Sampler

Authors

  • David L. Donoho
  • Robert Gray
  • Iain Johnstone
  • Amir Najmi
  • David Neuhoff
  • Donald Ornstein
  • Paul Shields
  • David Stork

Abstract

Given noisy observations $X_i = \theta_i + Z_i$, $i = 1, \dots, n$, with noise $Z_i$ iid $\sim N(0, \sigma^2)$, we wish to recover the signal $\theta$ with small mean-squared error. We consider the Minimum Kolmogorov Complexity Estimator (MKCE), defined roughly as the $n$-vector $\hat\theta(X)$ solving the problem $\min_Y K(Y)$ subject to $\|X - Y\|^2_{\ell^2_n} \le \sigma^2 \cdot n$, where $K(Y)$ denotes the length of the shortest computer program that can compute the finite-precision $n$-vector $Y$. In words, this is the simplest object that fits the data to within the lack-of-fit between $\theta$ and $X$ that would be expected on statistical grounds. Suppose that the $\theta_i$ are successive samples from a stationary ergodic process obeying $E\theta_1^2 < \infty$. Under a regularity condition on the process, we show that $\hat\theta$ behaves, for large $n$, as a single sample from the posterior $\mathcal{L}(\theta \mid X)$. In a sense, then, $\hat\theta$ is a universal empirical Bayesian procedure.

Parallel results hold in other settings. Suppose we have binary data $X_i$ which arise as $X_i = \theta_i \oplus Z_i$, where each unknown $\theta_i$ is binary-valued, and $\oplus$ denotes the exclusive-or of Bernoulli noise and $\theta$. Define MKCE in this setting as the Kolmogorov-simplest binary string matching the observed data except in n places. Suppose that $\theta$ follows a stationary ergodic process. Under a regularity condition on the process, we show that the MKCE behaves as a sample from the posterior distribution.

Our proof combines ideas from information theory to reach what seems an essentially novel conclusion: when the loss level is equal to the true underlying noise level, optimal lossy compression of random process data must exhibit reconstructions which look like samples from the correct Bayesian posterior. Our proof relies on a key idea from ergodic theory, namely the concept of B-process; this furnishes the 'regularity condition' on stationary ergodic processes mentioned earlier. We also rely on the $\bar d$ and $\bar\rho$ distances. Structure theorems about the class of B-processes allow us to conclude that $\bar\rho$-close to every B-process are other B-processes where an optimal codebook for Shannon lossy compression can be computed by a short program. Since codebook performance is $\bar\rho$-continuous, the Kolmogorov-simplest representation matching the noisy data to within the noise level must compress at least as well as optimal Shannon compression for such processes.
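
To make the constrained minimization in the binary setting concrete, here is a minimal toy sketch. It is not the authors' procedure: Kolmogorov complexity is uncomputable, so the sketch swaps in a crude, computable stand-in (the number of constant runs in the string) and searches by brute force. The names `run_count`, `toy_mkce`, and the mismatch allowance `budget` are hypothetical and introduced only for illustration.

```python
from itertools import combinations


def run_count(bits):
    """Crude complexity proxy (an assumption, NOT Kolmogorov complexity):
    the number of maximal constant runs in the binary sequence."""
    if not bits:
        return 0
    return 1 + sum(a != b for a, b in zip(bits, bits[1:]))


def toy_mkce(x, budget):
    """Brute-force search for the proxy-simplest binary string that
    differs from x in at most `budget` positions (hypothetical parameter).

    Only feasible for tiny inputs: the number of candidates grows
    combinatorially with len(x) and budget.
    """
    n = len(x)
    best = list(x)
    best_cost = run_count(best)
    for k in range(1, budget + 1):
        for flips in combinations(range(n), k):
            y = list(x)
            for i in flips:
                y[i] ^= 1  # flip the chosen bit
            cost = run_count(y)
            if cost < best_cost:  # keep the first strictly simpler candidate
                best, best_cost = y, cost
    return best


if __name__ == "__main__":
    theta = [0] * 8 + [1] * 8   # a simple underlying signal
    x = list(theta)
    x[2] ^= 1                   # two bits corrupted by noise
    x[11] ^= 1
    print(toy_mkce(x, budget=2))
```

In this tiny example the simplest string within two mismatches of the data happens to be the underlying signal itself. That is a property of the toy, not of the result stated above, which concerns the MKCE behaving like a posterior sample for large $n$.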

Similar resources

A Fleming–Viot Process and Bayesian Nonparametrics

This paper provides a construction of a Fleming–Viot measure-valued diffusion process, for which the transition function is known, by extending recent ideas on Gibbs sampler based Markov processes. In particular, we concentrate on the Chapman–Kolmogorov consistency conditions, which allow a simple derivation of such a Fleming–Viot process, once a key and apparently new combinatorial result ...

Laboratory Investigation of the Relative Efficiency of the MWAC, MDCO, CDS, and CDSC Dust Sediment Traps

Quantitative measurement of aeolian dust may help to monitor and control wind erosion properly. The aim of this study was to evaluate the efficiencies of four aeolian dust samplers, namely the modified Wilson and Cooke sampler (MWAC), cyclone dust sampler with cone (CDSC), cyclone dust sampler (CDS), and marble dust collector (MDCO), in comparison with the big spring number eight sampler (BSN...

Performance Evaluation of a Needle Trap Micro-Sampler Containing a Carbon Nanotube Silica Composite Sorbent for Sampling Perchloroethylene: A Laboratory and Field Study

Background and aims: The needle trap micro-sampler, which has all the advantages of microextraction techniques such as solid phase microextraction, is becoming a powerful method for air pollution monitoring owing to its ease of use. In this study, the performance of a needle trap micro-sampler device with a silica composite of carbon nanotubes, prepared using the sol-gel technique, was investigated for health moni...

The effects of abnormal blood pressure on arterial sampler filling times.

BACKGROUND: Sampler filling time runs from the initial flash of blood in the needle hub until the preset sampler volume is obtained. Previous studies have shown statistically significant differences between arterial and venous sampler filling times, but included only a few subjects with abnormal blood pressures. OBJECTIVE: To determine whether the time required to fill a vented arterial sampl...

Rapid Mixing Swendsen-Wang Sampler for Stochastic Partitioned Attractive Models

The Gibbs sampler is a particularly popular Markov chain used for learning and inference problems in Graphical Models (GMs). These tasks are computationally intractable in general, and the Gibbs sampler often suffers from slow mixing. In this paper, we study the Swendsen-Wang dynamics, a more sophisticated Markov chain designed to overcome bottlenecks that impede the Gibbs sampler. We pr...

Publication date: 2002